Tools for Artificial Intelligence

Master's Degree in Applied Artificial Intelligence

Final Written Academic Project

Build an artificial intelligence application that uses free/open-source libraries through collaborative tools.

Apply tools and techniques for collecting, analyzing, and presenting data related to Artificial Intelligence.

Identify opportunities to develop and lead technology projects in specific fields of artificial intelligence.

Problem statement:

  1. Select a platform: Jupyter Lab / Google Colab / DeepNote

  2. Use two datasets: one originating from a CSV file and another stored in a database.

  3. Load the resulting datasets through the Pandas library.

  4. Add 5 columns to the dataset, based on the context of the data.

  5. Create visualizations with:

  • Matplotlib (2 visualizations)

  • Bokeh (2 visualizations)

  • PygWalker (2 visualizations)

Members - Group 8

  • Jairo Pillajo
  • Luis Estrada
  • Juan Yupangui

Part 1: Select a platform

We chose Jupyter Lab for this project because of its advantages when exporting and sharing results. The platform not only allows creating and editing interactive notebooks, but also offers several export options for the resulting documents. It also integrates with collaborative tools and version control systems such as Git, which is essential for teamwork.

In [1]:
# Install required libraries
!pip install pymysql
!pip install pandas

Part 2: Use the datasets

We read two data sources: the first is a CSV file (customers); the second lives in a MySQL database (the `sales` table).

In [2]:
# Load libraries
import pandas as pd
from sqlalchemy import create_engine
In [3]:
# Load the CSV file
cli_df = pd.read_csv('data/customer.csv', encoding='latin-1', sep=';', dtype='str')

# Print customer information
print("Cantidad de clientes: ", cli_df.shape[0])
cli_df.head(5)
Cantidad de clientes:  862
Out[3]:
codi_clie tipo_ruc cli_nom cli_sector cli_fono fecha cli_prov cli_mail
0 9999999 01 CONCUJIDOR JINAL QUITO NaN NaN PICHINCHA NaN
1 0000393 04 JIRANDA IZURIITA CIRJIO ANIBAL LAGO AGRIO 2473475 27/5/2013 SUCUMBIOS NaN
2 0000658 05 ORIJUILA JORANTI CARLOC SANTO DOMING 997434424 9/10/2013 SANTO DOMINGO bespropoempoeto@yahoo.com
3 0000683 05 JARTINIZ JALDONADO JARIANA PUERTO LOPEZ 7968497664 28/10/2013 MANABI elesamarbonabo@yahoo.com
4 0000979 04 IL ARTICAN C-A- PUERTO LOPEZ 752377732 11/7/2014 MANABI beltataopalosanto@yahoo.e
In [4]:
# Read from the MySQL database
# Build the connection URI for MySQL
uri = "mysql+pymysql://u830706108_ia_g8:2024IA_g8*.*@193.203.166.22:3306/u830706108_ia_g8"

# Create the SQLAlchemy engine
sql_engine = create_engine(uri)

# SQL statement to query the data
sql = "SELECT * FROM sales"

# Load the data into a DataFrame using SQLAlchemy
sal_df = pd.read_sql(sql, sql_engine)

# Dispose of the engine's connection pool
sql_engine.dispose()

# Print sales information
print("Cantidad de Ventas: ", sal_df.shape[0])
sal_df.head(5)
Cantidad de Ventas:  2058
Out[4]:
caja fact_anul fact_cred codi_clie codi_fact vcr_num_fac vcr_fecha vcr_suman vcr_iva vcr_tipo vcr_divisa usuario
0 004 FALSO FALSO 9999999 0000124526 0000001 2019-02-16 36.88 4.43 E USD 001
1 004 FALSO FALSO 9999999 0000124527 0000002 2019-02-16 2.09 0.25 E USD 005
2 004 FALSO FALSO 9999999 0000124538 0000003 2019-02-18 19.18 2.30 E USD 005
3 004 FALSO FALSO 9999999 0000124542 0000005 2019-02-18 1.44 0.17 E USD 005
4 004 FALSO FALSO 9999999 0000124553 0000013 2019-02-18 27.13 3.26 E USD 009

Part 3: Merge the datasets

We merge the customers and sales DataFrames on the codi_clie attribute.

In [5]:
## Helper functions
# Assign tipo_telefono based on the length of cli_fono
def fnt_tipo_telefono(fono):
    if pd.isnull(fono) or len(str(fono)) == 0:
        return "Ninguno"
    elif 1 <= len(str(fono)) <= 9:
        return "Convencional"
    elif len(str(fono)) == 10:
        return "Celular"
    else:
        return "OT"

# Cast VERDADERO/FALSO values to SI/NO
def fnt_cast_boolean(valor):
    return {'VERDADERO': 'SI', 'FALSO': 'NO', 'SI': 'SI', 'NO': 'NO'}.get(valor)
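A quick sanity check of the two helper functions (re-declared here so the sketch runs on its own; the sample phone numbers are made up):

```python
import pandas as pd

def fnt_tipo_telefono(fono):
    # Classify a phone number by its length
    if pd.isnull(fono) or len(str(fono)) == 0:
        return "Ninguno"
    elif 1 <= len(str(fono)) <= 9:
        return "Convencional"
    elif len(str(fono)) == 10:
        return "Celular"
    else:
        return "OT"

def fnt_cast_boolean(valor):
    # Map VERDADERO/FALSO (and already-cast SI/NO) to SI/NO
    return {'VERDADERO': 'SI', 'FALSO': 'NO', 'SI': 'SI', 'NO': 'NO'}.get(valor)

print(fnt_tipo_telefono("2473475"))     # 7 digits  -> Convencional
print(fnt_tipo_telefono("0987654321"))  # 10 digits -> Celular
print(fnt_tipo_telefono(None))          # missing   -> Ninguno
print(fnt_cast_boolean("FALSO"))        # -> NO
```

Note that `fnt_cast_boolean` returns `None` for any value outside the four known strings, which would surface as NaN after `apply`.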
In [6]:
## Clean up cli_df (customer) data
from datetime import datetime        

# Apply the function to cli_fono and store the result in a new column
cli_df['tipo_telefono'] = cli_df['cli_fono'].apply(fnt_tipo_telefono)
cli_df['cantidad_correo'] = cli_df['cli_mail'].fillna('').str.count('@')

# Convert the 'fecha' column to datetime
cli_df['fecha'] = pd.to_datetime(cli_df['fecha'], errors='coerce', dayfirst=True)
# Replace NaN values in 'fecha' with the minimum date
cli_df['fecha'] = cli_df['fecha'].fillna(cli_df['fecha'].min())
# Compute the difference in years between today and 'fecha'
cli_df['antiguedad'] = (datetime.today() - cli_df['fecha']).dt.days // 365

# Replace null values in 'tipo_ruc' with "OT"
cli_df['tipo_ruc'] = cli_df['tipo_ruc'].fillna("OT")

# Remove hyphens '-' from 'cli_nom'
cli_df['cli_nom'] = cli_df['cli_nom'].str.replace('-', '')

# Drop columns no longer needed
cli_df.drop(columns=['cli_fono','cli_mail','fecha'], inplace=True)
cli_df.head(5)
Out[6]:
codi_clie tipo_ruc cli_nom cli_sector cli_prov tipo_telefono cantidad_correo antiguedad
0 9999999 01 CONCUJIDOR JINAL QUITO PICHINCHA Ninguno 0 10
1 0000393 04 JIRANDA IZURIITA CIRJIO ANIBAL LAGO AGRIO SUCUMBIOS Convencional 0 10
2 0000658 05 ORIJUILA JORANTI CARLOC SANTO DOMING SANTO DOMINGO Convencional 1 10
3 0000683 05 JARTINIZ JALDONADO JARIANA PUERTO LOPEZ MANABI Celular 1 10
4 0000979 04 IL ARTICAN CA PUERTO LOPEZ MANABI Convencional 1 9
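The tenure computation above can be illustrated on a toy date column (a minimal sketch with made-up dates; the integer division by 365 is the same approximation used above):

```python
import pandas as pd
from datetime import datetime

toy = pd.DataFrame({'fecha': ['27/5/2013', 'not a date', '9/10/2013']})

# errors='coerce' turns unparseable strings into NaT; dayfirst handles d/m/Y
toy['fecha'] = pd.to_datetime(toy['fecha'], errors='coerce', dayfirst=True)

# Fill NaT with the earliest valid date, then count whole years elapsed
toy['fecha'] = toy['fecha'].fillna(toy['fecha'].min())
toy['antiguedad'] = (datetime.today() - toy['fecha']).dt.days // 365
print(toy)
```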
In [7]:
## Clean up sal_df (sales) data
# Cast boolean-like values using the helper function
sal_df['fact_anul'] = sal_df['fact_anul'].apply(fnt_cast_boolean)
sal_df['fact_cred'] = sal_df['fact_cred'].apply(fnt_cast_boolean)

# Drop columns not needed from sales
sal_df.drop(columns=['codi_fact','vcr_num_fac'], inplace=True)

# Show the first rows
sal_df.head(5)
Out[7]:
caja fact_anul fact_cred codi_clie vcr_fecha vcr_suman vcr_iva vcr_tipo vcr_divisa usuario
0 004 NO NO 9999999 2019-02-16 36.88 4.43 E USD 001
1 004 NO NO 9999999 2019-02-16 2.09 0.25 E USD 005
2 004 NO NO 9999999 2019-02-18 19.18 2.30 E USD 005
3 004 NO NO 9999999 2019-02-18 1.44 0.17 E USD 005
4 004 NO NO 9999999 2019-02-18 27.13 3.26 E USD 009
In [8]:
## Merge the DataFrames
# Merge on the codi_clie attribute and add an origin column
info_df = pd.merge(cli_df, sal_df, on='codi_clie', how='inner', indicator='origin')

# Show the merged DataFrame size
print("Cantidad total: ", info_df.shape[0])
info_df.head(5)
Cantidad total:  2058
Out[8]:
codi_clie tipo_ruc cli_nom cli_sector cli_prov tipo_telefono cantidad_correo antiguedad caja fact_anul fact_cred vcr_fecha vcr_suman vcr_iva vcr_tipo vcr_divisa usuario origin
0 9999999 01 CONCUJIDOR JINAL QUITO PICHINCHA Ninguno 0 10 004 NO NO 2019-02-16 36.88 4.43 E USD 001 both
1 9999999 01 CONCUJIDOR JINAL QUITO PICHINCHA Ninguno 0 10 004 NO NO 2019-02-16 2.09 0.25 E USD 005 both
2 9999999 01 CONCUJIDOR JINAL QUITO PICHINCHA Ninguno 0 10 004 NO NO 2019-02-18 19.18 2.30 E USD 005 both
3 9999999 01 CONCUJIDOR JINAL QUITO PICHINCHA Ninguno 0 10 004 NO NO 2019-02-18 1.44 0.17 E USD 005 both
4 9999999 01 CONCUJIDOR JINAL QUITO PICHINCHA Ninguno 0 10 004 NO NO 2019-02-18 27.13 3.26 E USD 009 both
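The `indicator='origin'` argument used in the merge adds a categorical column recording each row's provenance; with an inner join it is always `'both'`, as the output above shows. A toy sketch with hypothetical data:

```python
import pandas as pd

clients = pd.DataFrame({'codi_clie': ['A', 'B'], 'name': ['Ana', 'Beto']})
sales = pd.DataFrame({'codi_clie': ['A', 'A', 'C'], 'amount': [10.0, 5.0, 7.0]})

# Inner join keeps only keys present on both sides; client A has two sales,
# B has none, and sale key C has no client, so only A's rows survive.
merged = pd.merge(clients, sales, on='codi_clie', how='inner', indicator='origin')
print(merged)
```

With `how='outer'` instead, the `origin` column would distinguish `left_only`, `right_only`, and `both` rows.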
In [9]:
## Add computed columns
# Compute vcr_iva as a percentage of vcr_suman
info_df['porcentaje_iva'] = ((info_df['vcr_iva'] / info_df['vcr_suman']) * 100).round(0)
# Format the column as a percentage
info_df['porcentaje_iva'] = info_df['porcentaje_iva'].map("{:.2f}%".format)

# Sum sale amount + tax
info_df['total'] = info_df['vcr_suman'] + info_df['vcr_iva']

info_df.head(5)
Out[9]:
codi_clie tipo_ruc cli_nom cli_sector cli_prov tipo_telefono cantidad_correo antiguedad caja fact_anul fact_cred vcr_fecha vcr_suman vcr_iva vcr_tipo vcr_divisa usuario origin porcentaje_iva total
0 9999999 01 CONCUJIDOR JINAL QUITO PICHINCHA Ninguno 0 10 004 NO NO 2019-02-16 36.88 4.43 E USD 001 both 12.00% 41.31
1 9999999 01 CONCUJIDOR JINAL QUITO PICHINCHA Ninguno 0 10 004 NO NO 2019-02-16 2.09 0.25 E USD 005 both 12.00% 2.34
2 9999999 01 CONCUJIDOR JINAL QUITO PICHINCHA Ninguno 0 10 004 NO NO 2019-02-18 19.18 2.30 E USD 005 both 12.00% 21.48
3 9999999 01 CONCUJIDOR JINAL QUITO PICHINCHA Ninguno 0 10 004 NO NO 2019-02-18 1.44 0.17 E USD 005 both 12.00% 1.61
4 9999999 01 CONCUJIDOR JINAL QUITO PICHINCHA Ninguno 0 10 004 NO NO 2019-02-18 27.13 3.26 E USD 009 both 12.00% 30.39
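Note how `round(0)` and the `"{:.2f}%"` format interact above: the ratio is rounded to a whole number first, so the two decimals printed are always zeros (hence the uniform "12.00%" in the output). A standalone check on two rows taken from the table above:

```python
import pandas as pd

toy = pd.DataFrame({'vcr_suman': [36.88, 2.09], 'vcr_iva': [4.43, 0.25]})

# Same two steps as in the cell above: round to an integer, then format
toy['porcentaje_iva'] = ((toy['vcr_iva'] / toy['vcr_suman']) * 100).round(0)
toy['porcentaje_iva'] = toy['porcentaje_iva'].map("{:.2f}%".format)
toy['total'] = toy['vcr_suman'] + toy['vcr_iva']
print(toy)
```

Keeping two decimals of real precision would require `round(2)` instead of `round(0)`.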

Part 4: Visualizations

In [10]:
info_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2058 entries, 0 to 2057
Data columns (total 20 columns):
 #   Column           Non-Null Count  Dtype   
---  ------           --------------  -----   
 0   codi_clie        2058 non-null   object  
 1   tipo_ruc         2058 non-null   object  
 2   cli_nom          2058 non-null   object  
 3   cli_sector       2026 non-null   object  
 4   cli_prov         2058 non-null   object  
 5   tipo_telefono    2058 non-null   object  
 6   cantidad_correo  2058 non-null   int64   
 7   antiguedad       2058 non-null   int64   
 8   caja             2058 non-null   object  
 9   fact_anul        2058 non-null   object  
 10  fact_cred        2058 non-null   object  
 11  vcr_fecha        2058 non-null   object  
 12  vcr_suman        2058 non-null   float64 
 13  vcr_iva          2058 non-null   float64 
 14  vcr_tipo         2058 non-null   object  
 15  vcr_divisa       2058 non-null   object  
 16  usuario          2058 non-null   object  
 17  origin           2058 non-null   category
 18  porcentaje_iva   2058 non-null   object  
 19  total            2058 non-null   float64 
dtypes: category(1), float64(3), int64(2), object(14)
memory usage: 307.8+ KB
In [11]:
# Install pygwalker
!pip install pygwalker==0.4.8
In [12]:
# Import the pygwalker library
import pygwalker as pyg

Part 4.1: PygWalker

In [13]:
# Plot customer tenure (years as a client)
vis_spec = r"""{"config":[{"config":{"defaultAggregated":true,"geoms":["line"],"coordSystem":"generic","limit":-1,"timezoneDisplayOffset":0},"encodings":{"dimensions":[{"fid":"codi_clie","name":"codi_clie","basename":"codi_clie","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"tipo_ruc","name":"tipo_ruc","basename":"tipo_ruc","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"cli_nom","name":"cli_nom","basename":"cli_nom","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"cli_sector","name":"cli_sector","basename":"cli_sector","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"cli_prov","name":"cli_prov","basename":"cli_prov","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"tipo_telefono","name":"tipo_telefono","basename":"tipo_telefono","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"cantidad_correo","name":"cantidad_correo","basename":"cantidad_correo","semanticType":"quantitative","analyticType":"dimension","offset":0},{"fid":"antiguedad","name":"antiguedad","basename":"antiguedad","semanticType":"quantitative","analyticType":"dimension","offset":0},{"fid":"gw_mea_key_fid","name":"Measure names","analyticType":"dimension","semanticType":"nominal"}],"measures":[{"fid":"gw_count_fid","name":"Row count","analyticType":"measure","semanticType":"quantitative","aggName":"sum","computed":true,"expression":{"op":"one","params":[],"as":"gw_count_fid"}},{"fid":"gw_mea_val_fid","name":"Measure values","analyticType":"measure","semanticType":"quantitative","aggName":"sum"}],"rows":[{"fid":"gw_count_fid","name":"Row 
count","analyticType":"measure","semanticType":"quantitative","aggName":"sum","computed":true,"expression":{"op":"one","params":[],"as":"gw_count_fid"}}],"columns":[{"fid":"antiguedad","name":"antiguedad","basename":"antiguedad","semanticType":"quantitative","analyticType":"dimension","offset":0}],"color":[],"opacity":[],"size":[],"shape":[],"radius":[],"theta":[],"longitude":[],"latitude":[],"geoId":[],"details":[],"filters":[],"text":[]},"layout":{"showActions":false,"showTableSummary":false,"stack":"stack","interactiveScale":false,"zeroScale":true,"size":{"mode":"full","width":320,"height":200},"format":{},"geoKey":"name","resolve":{"x":false,"y":false,"color":false,"opacity":false,"shape":false,"size":false}},"visId":"gw_1XvG","name":"Antiguedad de Clientes"}],"chart_map":{},"workflow_list":[{"workflow":[{"type":"transform","transform":[{"key":"gw_count_fid","expression":{"op":"one","params":[],"as":"gw_count_fid"}}]},{"type":"view","query":[{"op":"aggregate","groupBy":["antiguedad"],"measures":[{"field":"gw_count_fid","agg":"sum","asFieldKey":"gw_count_fid_sum"}]}]}]}],"version":"0.4.8"}"""
pyg.walk(cli_df, spec=vis_spec)
Out[13]:
<pygwalker.api.pygwalker.PygWalker at 0x156f38e5390>
In [14]:
# Number of clients per province
vis_spec = r"""{"config":[{"config":{"defaultAggregated":true,"geoms":["bar"],"coordSystem":"generic","limit":-1,"timezoneDisplayOffset":0,"folds":["gw_count_fid"]},"encodings":{"dimensions":[{"fid":"codi_clie","name":"codi_clie","basename":"codi_clie","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"tipo_ruc","name":"tipo_ruc","basename":"tipo_ruc","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"cli_nom","name":"cli_nom","basename":"cli_nom","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"cli_sector","name":"cli_sector","basename":"cli_sector","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"cli_prov","name":"cli_prov","basename":"cli_prov","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"tipo_telefono","name":"tipo_telefono","basename":"tipo_telefono","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"cantidad_correo","name":"cantidad_correo","basename":"cantidad_correo","semanticType":"quantitative","analyticType":"dimension","offset":0},{"fid":"antiguedad","name":"antiguedad","basename":"antiguedad","semanticType":"quantitative","analyticType":"dimension","offset":0},{"fid":"gw_mea_key_fid","name":"Measure names","analyticType":"dimension","semanticType":"nominal"}],"measures":[{"fid":"gw_count_fid","name":"Row count","analyticType":"measure","semanticType":"quantitative","aggName":"sum","computed":true,"expression":{"op":"one","params":[],"as":"gw_count_fid"}},{"fid":"gw_mea_val_fid","name":"Measure values","analyticType":"measure","semanticType":"quantitative","aggName":"sum"}],"rows":[{"fid":"gw_count_fid","name":"Row 
count","analyticType":"measure","semanticType":"quantitative","aggName":"sum","computed":true,"expression":{"op":"one","params":[],"as":"gw_count_fid"}}],"columns":[{"fid":"cli_prov","name":"cli_prov","basename":"cli_prov","semanticType":"nominal","analyticType":"dimension","offset":0}],"color":[],"opacity":[],"size":[],"shape":[],"radius":[],"theta":[],"longitude":[],"latitude":[],"geoId":[],"details":[{"fid":"gw_mea_key_fid","name":"Measure names","analyticType":"dimension","semanticType":"nominal"}],"filters":[],"text":[]},"layout":{"showActions":true,"showTableSummary":false,"stack":"stack","interactiveScale":true,"zeroScale":true,"size":{"mode":"full","width":320,"height":200},"format":{},"geoKey":"name","resolve":{"x":false,"y":false,"color":false,"opacity":false,"shape":false,"size":false}},"visId":"gw_UmV1","name":"Clientes Por Provincia"}],"chart_map":{},"workflow_list":[{"workflow":[{"type":"transform","transform":[{"key":"gw_count_fid","expression":{"op":"one","params":[],"as":"gw_count_fid"}}]},{"type":"view","query":[{"op":"aggregate","groupBy":["cli_prov"],"measures":[{"field":"gw_count_fid","agg":"sum","asFieldKey":"gw_count_fid_sum"}]}]}]}],"version":"0.4.8"}"""
pyg.walk(cli_df, spec=vis_spec)
Out[14]:
<pygwalker.api.pygwalker.PygWalker at 0x156f35a0d90>

Part 4.2: Matplotlib

In [15]:
import matplotlib.pyplot as plt
In [16]:
# Count the number of clients per province
clientes_por_provincia = info_df['cli_prov'].value_counts().sort_index()

# Compute total sales per province; sort_index on both series keeps them
# aligned by province (value_counts alone orders by count, not by key)
total_ventas_por_provincia = info_df.groupby('cli_prov')['vcr_suman'].sum().sort_index()

# Create the scatter plot
plt.figure(figsize=(10, 8))
plt.scatter(clientes_por_provincia, total_ventas_por_provincia, alpha=0.5)
plt.title('Number of Clients vs. Total Sales per Province')
plt.xlabel('Number of Clients')
plt.ylabel('Total Sales')
plt.grid(True)
plt.show()
In [17]:
# Group by user and register, counting the number of sales
ventas_por_usuario_caja = info_df.groupby(['usuario', 'caja']).size().unstack(fill_value=0)

# Create the stacked bar chart
ventas_por_usuario_caja.plot(kind='bar', stacked=True, figsize=(10, 6))

# Set the title and axis labels
plt.title('Number of Sales per User and Register')
plt.xlabel('User')
plt.ylabel('Number of Sales')

# Show the legend
plt.legend(title='Register')

# Show the chart
plt.show()
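The `groupby(...).size().unstack(fill_value=0)` idiom above pivots sale counts into a user × register table, which is what `plot(kind='bar', stacked=True)` consumes; a minimal sketch with made-up data:

```python
import pandas as pd

toy = pd.DataFrame({
    'usuario': ['001', '001', '005', '005', '005'],
    'caja':    ['004', '002', '004', '004', '002'],
})

# Count rows per (usuario, caja) pair, then pivot caja values into columns;
# fill_value=0 puts a zero where a user never used a given register
counts = toy.groupby(['usuario', 'caja']).size().unstack(fill_value=0)
print(counts)
```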

Part 4.3: Bokeh

In [18]:
# Install bokeh
!pip install bokeh
In [19]:
# Import the required libraries
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.transform import factor_cmap
from bokeh.palettes import Spectral6
In [20]:
## Bokeh -> total sales per month
# Configure Bokeh to render charts directly in the notebook
output_notebook()

# Convert the vcr_fecha column to datetime
info_df['vcr_fecha'] = pd.to_datetime(info_df['vcr_fecha'])

# Create a new column containing only the year and month
info_df['year_month'] = info_df['vcr_fecha'].dt.to_period('M')

# Group by year-month and compute total sales
ventas_por_mes = info_df.groupby('year_month')['total'].sum().reset_index()
ventas_por_mes['year_month'] = ventas_por_mes['year_month'].dt.to_timestamp()

# Create a Bokeh data source
source = ColumnDataSource(ventas_por_mes)

# Create the line chart
p = figure(height=400, width=800, title="Total Sales per Month",  # dimensions and title
           x_axis_type='datetime', toolbar_location=None, tools="")  # datetime x-axis, default tools disabled

# Hover tool showing tooltips when the cursor is over a data point
hover = HoverTool(
    tooltips=[
        ("Month", "@year_month{%F}"),  # month shown as a full date
        ("Total Sales", "@total{0.00}")  # total sales with two decimals
    ],
    formatters={
        '@year_month': 'datetime',  # '@year_month' must be formatted as a date
    }
)

# Add the hover tool to the figure
p.add_tools(hover)

# Draw a line connecting the data points
p.line(x='year_month', y='total', source=source, line_width=2, color='navy')

# Draw circles at each data point to highlight them
p.circle(x='year_month', y='total', size=5, source=source, fill_color="white", color='navy')

# Axis labels
p.xaxis.axis_label = 'Month'
p.yaxis.axis_label = 'Total Sales'

# Show the figure in the notebook
show(p)
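The monthly grouping feeding the Bokeh line chart relies on `Series.dt.to_period('M')`; a standalone sketch of just the aggregation step, with made-up rows:

```python
import pandas as pd

toy = pd.DataFrame({
    'vcr_fecha': pd.to_datetime(['2019-02-16', '2019-02-18', '2019-03-01']),
    'total': [41.31, 21.48, 10.00],
})

# Truncate each timestamp to its year-month Period, then sum per month
toy['year_month'] = toy['vcr_fecha'].dt.to_period('M')
monthly = toy.groupby('year_month')['total'].sum().reset_index()

# to_timestamp() converts the Period back to a datetime Bokeh can plot
monthly['year_month'] = monthly['year_month'].dt.to_timestamp()
print(monthly)
```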
In [21]:
## Bokeh -> voided invoices per month
# Convert the vcr_fecha column to datetime
info_df['vcr_fecha'] = pd.to_datetime(info_df['vcr_fecha'])

# Create a year-month column
info_df['year_month'] = info_df['vcr_fecha'].dt.to_period('M')

# Count voided and non-voided invoices per month
facturas_anuladas_por_mes = info_df.groupby(['year_month', 'fact_anul'])['fact_anul'].count().unstack(fill_value=0).reset_index()

# Convert the periods to strings for the x-axis range
facturas_anuladas_por_mes['year_month'] = facturas_anuladas_por_mes['year_month'].astype(str)

# Create a Bokeh data source
source = ColumnDataSource(facturas_anuladas_por_mes)

# Create the bar chart
p = figure(height=400, width=800, title="Voided Invoices per Month",
           x_range=facturas_anuladas_por_mes['year_month'], toolbar_location=None, tools="")

# Stack the voided ('SI') and non-voided ('NO') bars
p.vbar_stack(['SI', 'NO'], x='year_month', width=0.9, color=['red', 'green'], source=source,
             legend_label=['Voided', 'Not voided'])

# Configure chart details
p.xaxis.axis_label = 'Month'
p.yaxis.axis_label = 'Number of Invoices'
p.legend.title = 'Invoice Status'

# Show the chart
show(p)
In [22]:
# Save the resulting CSV
info_df.to_csv('data/salesForCustomerResult.csv', sep=';', index=False)
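Since the file is written with `sep=';'`, it must be read back with the same separator; a round-trip sketch using a temporary file (the sample values are made up):

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'codi_clie': ['0000393'], 'total': [41.31]})
path = os.path.join(tempfile.mkdtemp(), 'result.csv')
df.to_csv(path, sep=';', index=False)

# Reading with the default sep=',' would yield one mangled column;
# dtype='str' preserves leading zeros in codes like codi_clie
back = pd.read_csv(path, sep=';', dtype='str')
print(back)
```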